Realising Data-Centric Scientific Workflows with Provenance-Capturing on Data Lakes
Abstract: Since their introduction by James Dixon in 2010, data lakes have received more and more attention, driven by the promise of high reusability of the stored data due to schema-on-read semantics. Building on this idea, several additional requirements were discussed in the literature to improve the general usability of the concept, like a central metadata catalog including all provenance information, an overarching data governance, or the integration with (high-performance) processing capabilities. Although the necessity for a logical and a physical organisation of data lakes in order to meet those requirements is widely recognized, no concrete guidelines are yet provided. The most common architecture implementing this conceptual organisation is the zone architecture, where data is assigned to a certain zone depending on the degree of processing. This paper discusses how FAIR Digital Objects can be used in a novel approach to organize a data lake based on data types instead of zones, how they can be used to abstract the physical implementation, and how they empower generic and portable processing capabilities based on a provenance-based approach.
Similar resources
Recording Actor Provenance Data in Scientific Workflows
The concept of “actor” provenance data – essentially data that a client or service actor may assert about itself regarding an interaction, is presented. Actor provenance data can be combined with assertions of interaction to enable better reasoning within a provenance system. The need for recording and maintaining actor provenance data is discussed, along with the description of an architecture...
Capturing Interactive Data Transformation Operations Using Provenance Workflows
The ready availability of data is leading to the increased opportunity of their re-use for new applications and for analyses. Most of these data are not necessarily in the format users want, are usually heterogeneous, and highly dynamic, and this necessitates data transformation efforts to re-purpose them. Interactive data transformation (IDT) tools are becoming easily available to lower these ...
An Algebraic Approach for Data-Centric Scientific Workflows
Scientific workflows have emerged as a basic abstraction for structuring and executing scientific experiments in computational environments. In many situations, these workflows are computationally and data intensive, thus requiring execution in large-scale parallel computers. However, parallelization of scientific workflows remains low-level, ad hoc and labor-intensive, which makes it hard to ex...
Managing Provenance in Scientific Workflows with ProvManager
Running scientific workflows in distributed environments is motivating the definition of provenance gathering approaches that are loosely coupled to the workflow systems. We have proposed a provenance gathering strategy that is independent from workflow system technology. This strategy has evolved into a provenance management system named ProvManager. The main principle is that each workflow ac...
Collaborative Data-centric Workflows: Towards Knowledge centric workflows and Integrating Uncertain Data
The acquisition of data, in particular scientific data, is more and more organized in complex processes that are captured by workflows. These workflows are often driven by ontologies. For example, the collaborative application Spipoll [3] proposes to collect information about pollination in France. The users take pictures of insects on flowers, download them on the application and then ident...
Journal
Journal title: Data Intelligence
Year: 2022
ISSN: 2096-7004, 2641-435X
DOI: https://doi.org/10.1162/dint_a_00141